Non-invasively identifying candidates of active surveillance for prostate cancer using magnetic resonance imaging radiomics

Active surveillance (AS) is the primary strategy for managing patients with low or favorable-intermediate risk prostate cancer (PCa). Identifying patients who may benefit from AS relies on unpleasant prostate biopsies, which entail the risk of bleeding and infection. In the current study, we aimed to develop a radiomics model based on prostate magnetic resonance images to identify AS candidates non-invasively. A total of 956 PCa patients with complete biopsy reports from six hospitals were included in the current multicenter retrospective study. The National Comprehensive Cancer Network (NCCN) guidelines were used as reference standards to determine the AS candidacy. To discriminate between AS and non-AS candidates, five radiomics models were developed, each based on a different classifier: the eXtreme Gradient Boosting (XGBoost) AS classifier (XGB-AS), the logistic regression (LR) AS classifier, the random forest (RF) AS classifier, the adaptive boosting (AdaBoost) AS classifier, and the decision tree (DT) AS classifier. All models were externally validated using a three-fold cross-center validation. Area under the receiver operating characteristic curve (AUC), accuracy (ACC), sensitivity (SEN), and specificity (SPE) were calculated to evaluate the performance of these models. XGB-AS achieved a mean AUC of 0.803, ACC of 0.693, SEN of 0.668, and SPE of 0.841, showing a better comprehensive performance than the other included radiomics models. Additionally, the XGB-AS model presented a promising performance for identifying AS candidates from the intermediate-risk cases and the ambiguous cases with diagnostic discordance between the NCCN guidelines and the Prostate Imaging-Reporting and Data System assessment. These results suggest that the XGB-AS model has the potential to help identify patients who are suitable for AS and allow non-invasive monitoring of patients on AS, thereby reducing the number of annual biopsies and the associated risks of bleeding and infection.
Supplementary Information The online version contains supplementary material available at 10.1186/s42492-024-00167-6.


Introduction
Early detection and treatment can effectively reduce prostate cancer (PCa) mortality [1]. However, for some patients diagnosed early, PCa may not pose an immediate threat to health throughout their lifetime. Thus, immediate treatment may not benefit these patients but may result in side effects (i.e., sexual dysfunction, urinary dysfunction, and fatigue) that diminish the quality of life [2,3].
Active surveillance (AS) refers to regular monitoring of PCa progression, during which curative treatment is administered once PCa evolves into a high-risk tumor [4]. The primary aim of AS is to delay or avoid unnecessary treatment and its corresponding undesirable effects [5]. Therefore, AS has become the primary strategy for managing patients with low- or favorable intermediate-risk (FIR) PCa [6]. According to AS protocols [7,8], an annual biopsy is required to determine whether patients on AS require reclassification to a higher-risk category. However, repeated biopsies increase pain and the risk of infection [9,10] and may complicate the execution of radical prostatectomy (RP) [11].
Magnetic resonance imaging (MRI) is a non-invasive imaging method that can provide high spatial resolution and overall morphological characterization of tumors [12,13]. In particular, the standardized assessment method, known as the Prostate Imaging-Reporting and Data System v.2 (PI-RADSv2), has been reported to be crucial in identifying suitable AS candidates [14,15]. However, the PI-RADSv2 assessment relies on a semi-quantitative interpretation of MRI images and greatly depends on the radiologist's experience, resulting in substantial variability in the assessment results among different radiologists [15-17]. Additionally, the visual assessment by radiologists may overlook some of the non-visible information from the tumors.
Gaur [18] suggested the use of radiomics in AS for PCa. Radiomics methods can extract high-throughput features from medical images, including those not visible to the naked eye, that may reflect tumor phenotypes [19-21] and output a quantitative score indicating the risk probability of the tumor [22]. Recent studies have found that radiomics methods could predict PCa progression in patients on AS [23,24]. For instance, Algohary et al. [23] developed a radiomic model to identify clinically significant PCa in patients undergoing AS. Sushentsev et al. [24] developed a radiomic model to predict the histopathological progression of PCa in patients undergoing AS. However, none of these studies addressed the identification of suitable AS candidates, and they were limited by small sample sizes and the absence of independent external validation [23,24]. Therefore, the current study aimed to develop and externally validate a radiomics model using a multicenter dataset to non-invasively discriminate patients with PCa who qualify for AS from those who should undergo definitive treatments, such as RP.

Patients and MRI techniques
The local Institutional Ethics Review Board approved the study and waived the requirement for written informed consent owing to its retrospective nature. This study adhered to the 1964 Declaration of Helsinki and its subsequent guidelines. Overall, 1,735 consecutive patients who underwent prostate biopsy at six hospitals between January 2018 and June 2021 were enrolled. Based on the inclusion and exclusion criteria (Fig. 1), 956 patients (166, 167, 97, 100, 316, and 110 from hospitals 1 (H1), 2 (H2), 3 (H3), 4 (H4), 5 (H5), and 6 (H6), respectively) were included in the study. All patients underwent 3.0-T MRI using an abdominal phased-array coil before prostate biopsy (Supplementary Table S1).

Biopsy analysis, PI-RADS assessment, and lesion annotation
The biopsy results for H1, H2, H4, H5, and H6 were obtained using transrectal ultrasound (TRUS)-guided systematic biopsy and MRI-guided targeted biopsy, and those for H3 were obtained using TRUS-guided saturation biopsy. At each hospital, a junior pathologist analyzed the samples, and the results were verified by a senior pathologist. Disagreements were resolved through discussions between the readers.
Eight junior radiologists (JR1-8) and three experienced radiologists (ER1-3), with over 3 and 18 years of experience, respectively, interpreted the images according to PI-RADSv2.1 [15]. After the PI-RADS assessment, the same junior radiologist delineated the prostate lesions on the T2-weighted (T2W) images. The delineated lesion was referred to as the region of interest (ROI). The PI-RADS assessment and lesion annotation details are described in Supplementary Sect. 1.

Development and validation of the radiomics model
Figure 2 illustrates the workflow for constructing a radiomics model (e.g., one based on eXtreme Gradient Boosting (XGBoost)). Considering their easy acquisition and abundant texture information, T2W images were used to construct the radiomics model [26,27]. First, images were preprocessed (Supplementary Sect. 2). Next, for each participant, 1,595 radiomics features were extracted from the ROI of the original T2W and the derived images (Supplementary Sect. 3). Then, through feature selection, the radiomics features that were most correlated with the classification were retained from the 1,595 radiomics features (Supplementary Sect. 4). Finally, XGBoost, logistic regression (LR), random forest (RF), adaptive boosting (AdaBoost), and decision tree (DT) classifiers were used to develop classification models based on the selected radiomic features to identify AS candidates. These radiomics models were referred to as XGBoost AS classifier (XGB-AS), LR AS classifier (LR-AS), RF AS classifier (RF-AS), AdaBoost AS classifier (AdaB-AS), and DT AS classifier (DT-AS), respectively.

Fig. 2 AS candidate classification radiomics model workflow pipeline. (a) MR images were exported through the post-processing workstation. For the lesions on T2W images, the ROIs were manually annotated slice by slice; (b) The radiomics features, including shape, texture, histogram, and filter-based features, were extracted; (c) Using a t-test, highly differentiated features were selected to distinguish AS from non-AS candidates. Then, LASSO with a five-fold cross-validation was implemented for further feature selection; (d) Using the features selected by LASSO, a radiomics model was constructed based on the traditional machine learning model (e.g., the XGBoost classifier). Two subgroup analyses were performed to further evaluate this radiomics model's performance, including distinguishing AS from the ambiguous case group and the intermediate-risk group. IR PCa: Intermediate-risk prostate cancer; LASSO: Least absolute shrinkage and selection operator; MR: Magnetic resonance
A three-fold cross-center validation was conducted for each model (i.e., LR-AS, RF-AS, AdaB-AS, DT-AS, XGB-AS), with four hospitals used as a training cohort (TC) and the remaining two hospitals used as an external validation cohort (EVC) in each fold, ensuring that the models were multi-center trained and multi-center tested. The details of the data splitting for each fold are summarized in Supplementary Table S2. Specifically, for the first fold (Fold 1), patients from H1-4 (n = 530) formed the TC and those from H5-6 (n = 426) formed the EVC; for the second fold (Fold 2), patients from H1, 2, 5, and 6 (n = 759) formed the TC and those from H3-4 (n = 197) formed the EVC; for the third fold (Fold 3), patients from H3-6 (n = 623) formed the TC and those from H1-2 (n = 333) formed the EVC.
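The splitting scheme can be checked with a short sketch. The hospital sizes and fold assignments are taken from the text above; the function and variable names are illustrative assumptions and are not part of the study's code.

```python
# Patients per hospital, as reported in the cohort description.
HOSPITAL_SIZES = {"H1": 166, "H2": 167, "H3": 97, "H4": 100, "H5": 316, "H6": 110}

# Training hospitals per fold; the two remaining hospitals form the EVC.
FOLD_TRAINING_HOSPITALS = {
    1: ("H1", "H2", "H3", "H4"),
    2: ("H1", "H2", "H5", "H6"),
    3: ("H3", "H4", "H5", "H6"),
}


def cross_center_folds(sizes, training_hospitals):
    """Return, per fold, the TC/EVC hospital lists and their patient counts."""
    folds = {}
    for fold, train in training_hospitals.items():
        test = [h for h in sizes if h not in train]
        folds[fold] = {
            "TC": list(train),
            "EVC": test,
            "n_TC": sum(sizes[h] for h in train),
            "n_EVC": sum(sizes[h] for h in test),
        }
    return folds


FOLDS = cross_center_folds(HOSPITAL_SIZES, FOLD_TRAINING_HOSPITALS)
for fold, info in FOLDS.items():
    print(f"Fold {fold}: TC={info['TC']} (n={info['n_TC']}), "
          f"EVC={info['EVC']} (n={info['n_EVC']})")
```

Splitting by hospital rather than by patient keeps every EVC case from a center the model never saw during training, which is what makes the validation external.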
In each fold of the three-fold cross-center validation, the models were developed using open-source packages in Python (v.3.7), including Scikit-learn and xgboost (v.1.6.2). The hyperparameters for these models were optimized using GridSearchCV. GridSearchCV is a class in the Scikit-learn library that combines two elements: grid search, which enumerates the candidate hyperparameter combinations and searches for the optimal one, and cross-validation (five-fold cross-validation for the current study), which assesses the model's performance across different subsets of the TC.
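To make the two elements concrete, the following is a minimal pure-Python sketch of the idea behind grid search with k-fold cross-validation. It is not the Scikit-learn implementation and not the study's code: the toy data, the threshold "model", the `margin` hyperparameter, and all function names are hypothetical.

```python
from itertools import product
from statistics import mean


def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs for k contiguous, near-equal folds."""
    start = 0
    for i in range(k):
        size = n_samples // k + (1 if i < n_samples % k else 0)
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, val
        start += size


def grid_search_cv(param_grid, fit_and_score, n_samples, k=5):
    """Try every hyperparameter combination; keep the best mean CV score."""
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = mean(fit_and_score(params, train, val)
                     for train, val in k_fold_indices(n_samples, k))
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical toy data: one feature, two well-separated classes.
Y = [i % 2 for i in range(20)]
X = [0.3 + 0.4 * y + ((2 * i) % 5 - 2) * 0.05 for i, y in enumerate(Y)]


def fit_and_score(params, train_idx, val_idx):
    """Fit a cut-off on the training part, return accuracy on the validation part."""
    mean0 = mean(X[i] for i in train_idx if Y[i] == 0)
    mean1 = mean(X[i] for i in train_idx if Y[i] == 1)
    cut = (mean0 + mean1) / 2 + params["margin"]  # "margin" is the hyperparameter
    hits = sum((X[i] > cut) == (Y[i] == 1) for i in val_idx)
    return hits / len(val_idx)


BEST_PARAMS, BEST_SCORE = grid_search_cv(
    {"margin": [-0.3, 0.0, 0.3]}, fit_and_score, len(X), k=5
)
print(BEST_PARAMS, BEST_SCORE)
```

In the study's setting, the cross-validation subsets would all be drawn from the TC of the current fold, so the EVC plays no role in hyperparameter selection.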
Owing to the imbalance between the number of AS and non-AS cases, the classification threshold was determined by the threshold-moving method [28], namely, $n_{\mathrm{AS}} / (n_{\mathrm{AS}} + n_{\mathrm{non\text{-}AS}})$, where $n_{\mathrm{AS}}$ and $n_{\mathrm{non\text{-}AS}}$ refer to the number of AS and non-AS cases in the TC of the corresponding fold of the three-fold cross-center validation, respectively (Supplementary Table S2). Thus, if the output score of the radiomics model for a case exceeded the threshold, the case was classified into the AS group; otherwise, it was classified into the non-AS group. In agreement with clinical practice, the non-AS group (requiring immediate treatment) was designated as positive cases, and the AS group was designated as negative cases.
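A minimal sketch of this threshold-moving rule follows. The case counts and scores are hypothetical, since the per-fold counts are reported only in Supplementary Table S2, and the function names are illustrative.

```python
def moving_threshold(n_as, n_non_as):
    """Threshold equal to the proportion of AS cases in the training cohort."""
    return n_as / (n_as + n_non_as)


def classify(scores, threshold):
    """Scores above the threshold go to the AS group, all others to non-AS."""
    return ["AS" if score > threshold else "non-AS" for score in scores]


# Hypothetical training cohort: 100 AS and 300 non-AS cases.
THRESHOLD = moving_threshold(n_as=100, n_non_as=300)

# Hypothetical model output scores for four validation cases.
LABELS = classify([0.10, 0.25, 0.30, 0.80], THRESHOLD)
print(THRESHOLD, LABELS)
```

With a minority AS class, the threshold falls below 0.5, so fewer AS cases are lost than with a default cut-off of 0.5.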
The mean area under the receiver operating characteristic curve (AUC), accuracy (ACC), sensitivity (SEN), and specificity (SPE) of the included radiomics models (i.e., XGB-AS, LR-AS, RF-AS, AdaB-AS, and DT-AS) across the three-fold cross-center validation were calculated. AUC reflects the overall performance of a classification model independently of the threshold; therefore, it was used to compare the performance of the models (i.e., XGB-AS, LR-AS, RF-AS, AdaB-AS, and DT-AS) for identifying AS candidates.
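The four metrics can be sketched in plain Python as below, using the convention stated above (non-AS is the positive class, coded 1; AS is the negative class, coded 0). The example labels and scores are hypothetical, and the AUC function assumes its scores are oriented toward the positive class, so a score expressing AS probability would first be converted (for example, to one minus the score).

```python
def confusion_metrics(y_true, y_pred):
    """ACC, SEN, and SPE with 1 = non-AS (positive) and 0 = AS (negative)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "ACC": (tp + tn) / len(pairs),
        "SEN": tp / (tp + fn),  # share of non-AS cases correctly identified
        "SPE": tn / (tn + fp),  # share of AS cases correctly identified
    }


def auc(y_true, positive_scores):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, positive_scores) if t == 1]
    neg = [s for t, s in zip(y_true, positive_scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical example: six cases, three positive and three negative.
Y_TRUE = [1, 1, 1, 0, 0, 0]
SCORES = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]  # scores for the positive class
Y_PRED = [1 if s > 0.45 else 0 for s in SCORES]

METRICS = confusion_metrics(Y_TRUE, Y_PRED)
AUC_VALUE = auc(Y_TRUE, SCORES)
print(METRICS, AUC_VALUE)
```

Unlike ACC, SEN, and SPE, the AUC is computed from the ranking of scores alone, which is why it does not change when the classification threshold is moved.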

Subgroup analysis
Two subgroup analyses were conducted in EVC, using ACC to evaluate the performance of the model, as described below.
(1) Identifying AS candidates with discordance in their assessment results between the PI-RADS [15] and NCCN guidelines [8]: In clinical practice, patients with PI-RADS < 3 are not considered for biopsy due to the relatively low risk of clinically significant PCa (csPCa), whereas those with PI-RADS ≥ 3 necessitate biopsy confirmation due to the relatively high risk of csPCa [15,29,30]. However, taking the EVC of Fold 1 as an example, 36 patients among those with PI-RADS < 3 did not qualify for AS, according to the NCCN guidelines [8]. In contrast, 42 patients among those with PI-RADS ≥ 3 (i.e., 34 patients with PI-RADS > 3 and eight patients with PI-RADS = 3) were considered suitable for AS according to the NCCN guidelines [8]. Thus, for these 78 ambiguous cases, we evaluated whether XGB-AS could aid in identifying AS candidates and thereby reduce unnecessary biopsies. ( 2

Statistical analyses
To assess the intergroup differences in the proportion of AS candidates between TC and EVC in the threefold cross-center validation, the χ

Patient characteristics
Overall, 956 patients with PCa who underwent 3.0-T MRI at six hospitals were included. S2). For convenience, the XGB-AS models trained and tested in Folds 1, 2, and 3 are referred to as XGB-AS-1, XGB-AS-2, and XGB-AS-3, respectively; their AUC, ACC, SEN, and SPE are summarized in Table 3. As indicated in Table 3, the AUC of XGB-AS-2 was slightly higher than that of XGB-AS-1, which in turn was much higher than that of XGB-AS-3. The optimal hyperparameters of XGB-AS-1, XGB-AS-2, and XGB-AS-3 are detailed in Supplementary Table S3. Additionally, as indicated in Table 3, there was no significant difference in the proportion of AS candidates between the TC and EVC for Fold 1 (P = 0.069) or Fold 2 (P = 0.259) of the three-fold cross-center validation. In contrast, this difference was significant for Fold 3 (P = 0.0043), which may be one of the reasons for the lower performance of XGB-AS-3. Thus, to minimize the bias resulting from patient splitting during the three-fold cross-center validation, the model with the median performance according to AUC (i.e., XGB-AS-1) was selected as the most clinically applicable model and was used for the subgroup analyses in the corresponding EVC to further validate its clinical performance.
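The comparison of AS proportions between the TC and EVC can be illustrated with a plain-Python χ² test on a 2×2 table. This is a sketch under stated assumptions: the counts below are hypothetical (the per-fold AS counts are not given in this section), and the Pearson statistic is computed without a continuity correction, which the text does not specify.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test for the 2x2 table [[a, b], [c, d]] (1 degree of freedom)."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For one degree of freedom, the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: rows are cohorts (TC, EVC), columns are (AS, non-AS).
CHI2_EQUAL, P_EQUAL = chi_square_2x2(10, 90, 20, 180)  # same AS proportion in both rows
CHI2_DIFF, P_DIFF = chi_square_2x2(10, 20, 20, 10)     # clearly different proportions
print(CHI2_EQUAL, P_EQUAL)
print(CHI2_DIFF, P_DIFF)
```

A small P-value, as reported for Fold 3, indicates that the training and validation cohorts differ in their share of AS candidates, which can shift both the threshold and the measured performance.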

Feature analysis
Table 4 summarizes the features selected for the development of XGB-AS-1. As indicated in Table 4, only three categories of features (i.e., one original feature, nine wavelet features, and three local binary pattern in 3D (LBP-3D) features) were selected by the feature selection process to develop XGB-AS-1 (Supplementary Figure S1). Among these features, when comparing the AS group to the non-AS group, four features (including one original feature, two wavelet features, and one LBP-3D feature) had significantly higher values for the AS group, whereas the remaining nine features (consisting of seven wavelet features and two LBP-3D features) demonstrated significantly lower values for the AS group (P < 0.05).

Discussion
In this study, a radiomics model based on MRI was developed and externally validated to discriminate between AS and non-AS candidates. The results indicated that XGB-AS demonstrated promising performance in identifying AS candidates. According to AS protocols [7,8], patients on AS must periodically undergo repeat biopsies to determine whether they can continue to follow AS. However, frequent biopsies lead to side effects such as bleeding and infection [9,10] and may make subsequent RP particularly difficult [11]. In the current study, XGB-AS correctly identified an average of 84.1% of AS candidates. If XGB-AS had been utilized previously, patients with PCa could potentially have avoided unnecessary biopsies, the risk of overtreatment, and potentially challenging RP. Thus, XGB-AS can serve as a primary non-invasive categorization tool, assisting in the accurate identification of AS candidates and avoiding the detrimental effects of repeated biopsies.
In terms of identifying patients with PCa who required biopsy confirmation, XGB-AS displayed better performance than the PI-RADS assessment conducted by experienced radiologists. Moreover, the PI-RADS assessment [15] and the NCCN guidelines [8] can disagree on whether biopsy confirmation is required. The proposed XGB-AS-1 accurately identified 78.6% (33/42) of AS candidates and 55.6% (20/36) of non-AS candidates from ambiguous cases, with discordance in the assessment results between the PI-RADS assessment and the NCCN guidelines (i.e., the reference standard of the current study). Thus, when patients with PCa are assessed using MRI, our model could help reduce unnecessary biopsies and enhance detection SEN and SPE. Therefore, our model may be a potential tool to aid radiologists in the risk stratification of PCa based on non-invasive MRI images. According to clinical practice guidelines [8], an invasive biopsy is necessary for risk stratification of patients with FIR and unfavorable intermediate-risk (UFIR) PCa. However, utilizing XGB-AS-1, 83.8% (31/37) of the FIR patients who were suitable AS candidates and 54.2% (26/48) of the UFIR patients who were non-AS candidates were correctly identified using MRI. Thus, unnecessary biopsies could be avoided in these patients and early detection could be achieved. These results further underscore the capability of the proposed XGB-AS-1 model to discern subtle differences between AS and non-AS candidates, thereby aiding in identifying AS candidates based on MRI.

Table 4
The difference in the value of features selected by the feature selection for the development of XGB-AS-1 between AS and non-AS candidates in the corresponding TC

Overall, 13 radiomics features that exhibited significant differences in feature values between AS and non-AS candidates were included in XGB-AS-1. Among them, original_shape_sphericity was greater for AS than for non-AS. This feature measures the similarity between the shapes of a lesion and a sphere. Thus, our findings suggest that lesions in AS candidates exhibit a more regular shape than those in non-AS candidates. Similar to our findings, Wang et al. [31] reported that the original_shape_sphericity of adrenal lipid-poor adenomas was greater than that of adrenal metastases. This original_shape_sphericity was calculated from the original T2W images rather than from their derived images (e.g., wavelet images). Thus, differences in original_shape_sphericity can provide radiologists and urologists with direct visual and semantic information to determine AS. Additionally, the selected features included wavelet and LBP-3D features, consistent with recent radiomics studies that reported a relationship between these features and tumor progression, as observed in Hodgkin lymphoma [32], cervical cancer [33], and meningioma [34]. Unlike original_shape_sphericity, these features are quantified from the derived images and, hence, are not visually represented. However, they comprise most of the selected radiomics features and encompass substantial subtle and invisible information capable of quantitatively characterizing the heterogeneity of PCa. Consequently, they play an important role in the identification of AS candidates.
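For readers unfamiliar with the shape feature discussed above, sphericity is commonly defined as $(36\pi V^2)^{1/3} / A$, where $V$ is the lesion volume and $A$ its surface area. The sketch below assumes this standard definition; the extraction software and exact formula used in the study are described only in the supplementary material, and the function name is illustrative.

```python
import math


def sphericity(volume, surface_area):
    """Ratio of the surface area of a sphere with the same volume to the actual surface area."""
    return (36 * math.pi * volume ** 2) ** (1 / 3) / surface_area


# A perfect sphere of radius 2 (arbitrary units) reaches the maximum value.
RADIUS = 2.0
SPHERE = sphericity(4 / 3 * math.pi * RADIUS ** 3, 4 * math.pi * RADIUS ** 2)

# A unit cube is less sphere-like, so its value is lower.
CUBE = sphericity(1.0, 6.0)
print(SPHERE, CUBE)
```

Values range from just above 0 to 1, with 1 reached only by a perfect sphere, so a higher value in the AS group is consistent with more regular, rounder lesions.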
This study had three limitations. First, although the proposed model was tested using the EVC, the study was retrospective. Future studies should validate and broaden our findings by using prospective data. Second, multicenter cases were manually segmented, which was time-consuming. An automatic segmentation algorithm would be beneficial for future studies. Third, the performance of XGB-AS-1 in ambiguous and intermediate-risk cases was not excellent; a more advanced model (e.g., a deep-learning model) may have the potential to stratify them more accurately. However, the number of ambiguous and intermediate-risk cases in the current dataset was relatively small, rendering it insufficient to train and validate a deep-learning model. Given the clinical significance of the risk re-stratification of ambiguous and intermediate-risk patients, further studies should develop more advanced models with a large amount of data.

Conclusions
In conclusion, the proposed radiomics model demonstrated promising performance in identifying candidates for AS, particularly in the classification of AS and non-AS candidates among the patients with PCa considered intermediate risk and those misclassified by the PI-RADS assessment. These findings suggest that the XGB-AS model has the potential to help identify patients who are suitable for AS and allow non-invasive monitoring of patients on AS, thereby reducing the number of annual biopsies and the associated risks of bleeding and infection.

Fig. 1 Overview of patients based on the inclusion and exclusion criteria and allocation of patients in the training and external validation cohorts. cT stage: clinical tumor stage; DCE: Dynamic contrast-enhanced; DWI: Diffusion-weighted imaging; mpMRI: Multiparametric magnetic resonance imaging; PSA: Prostate-specific antigen; T2WI: T2-weighted imaging

Bold numbers represent the highest values of the performance metrics in each column. The P-value was calculated using a χ² test to compare the AS% between the TC and EVC for the corresponding fold of cross-center validation. XGB-AS: radiomics model based on the eXtreme Gradient Boosting classifier for identifying AS candidates; AS%: the proportion of active surveillance candidates; XGB-AS-1: the XGB-AS model developed in the first fold of the three-fold cross-center validation; XGB-AS-2: the XGB-AS model developed in the second fold of the three-fold cross-center validation; XGB-AS-3: the XGB-AS model developed in the third fold of the three-fold cross-center validation

Fig. 3 Fig. 4
Fig. 3 Receiver operating characteristic curves for comparisons between the XGB-AS-1 and PI-RADS performed by experienced radiologists

Table 2
AdaB-AS presented much lower SPE (0.539 vs 0.841) and SEN (0.491 vs 0.668) than XGB-AS. These results indicate that XGB-AS exhibits better comprehensive performance in identifying AS candidates than the other models.

Table 1
Descriptive characteristics and distribution of AS and non-AS candidates from six hospitals
IQR: Interquartile range

Table 2
Mean performance of the included radiomics models for identifying AS candidates across three-fold cross-center validation
Bold numbers represent the highest values of the performance metrics in each column. LR-AS: Radiomics model based on the Logistic Regression classifier for identifying AS candidates; RF-AS: Radiomics model based on the Random Forest classifier for identifying AS candidates; AdaB-AS: Radiomics model based on the Adaptive Boosting classifier for identifying AS candidates; DT-AS: Radiomics model based on the Decision Tree classifier for identifying AS candidates; XGB-AS: Radiomics model based on the eXtreme Gradient Boosting classifier for identifying AS candidates

Table 3
Performance of the XGB-AS model developed in each fold of three-fold cross-center validation